Non-linear Programming

Fig. 9 A hierarchy of convex optimization problems. (LP: linear program, QP: quadratic program, SOCP: second-order cone program, SDP: semidefinite program, CP: cone program.) [Wikipedia]

TBC

Primal and Dual short intro.

Lagrange multiplier:

  • geometric motivation of the method: at the optimum, the level set of the objective function is tangent to the constraint set, i.e. their gradients are aligned.

  • formulation of the Lagrangian \(\mathcal{L}\): combine the objective and the constraints into a single function and solve \(\nabla\mathcal{L} = 0\).

  • interpretation and proof of the Lagrange multiplier \(\lambda\) as \(\frac{\partial f}{\partial c}\), e.g. if the budget changes, how much will revenue change? (See the worked example below.)

Rayleigh Quotients

Consider the following constrained optimization:

\[\begin{split}\begin{aligned} \max_{\boldsymbol{x}} && \boldsymbol{x} ^{\top} \boldsymbol{A} \boldsymbol{x} & \\ \mathrm{s.t.} && \left\| \boldsymbol{x} \right\|^2 &= 1 \\ \end{aligned}\end{split}\]

An equivalent unconstrained problem is

\[\begin{split}\begin{aligned} \max_{\boldsymbol{x} \ne \boldsymbol{0}} && \frac{\boldsymbol{x} ^{\top} \boldsymbol{A} \boldsymbol{x} }{\boldsymbol{x} ^{\top} \boldsymbol{x} } & \\ \end{aligned}\end{split}\]

which makes the objective function invariant to scaling of \(\boldsymbol{x}\). How do we solve this?

Definition (Quadratic forms)
Let \(\boldsymbol{A}\) be a symmetric real matrix. A quadratic form corresponding to \(\boldsymbol{A}\) is a function \(Q: \mathbb{R} ^n \rightarrow \mathbb{R}\) with

\[ Q_{\boldsymbol{A}}(\boldsymbol{x}) = \boldsymbol{x} ^{\top} \boldsymbol{A} \boldsymbol{x} \]

A quadratic form can be written as a polynomial in which every term is of second order

\[ \boldsymbol{x} ^{\top} \boldsymbol{A} \boldsymbol{x} = \sum_{i, j=1}^n a_{ij} x_i x_j \]

Definition (Rayleigh quotient)

  • For a fixed symmetric matrix \(\boldsymbol{A}\), the normalized quadratic form \(\frac{\boldsymbol{x} ^{\top} \boldsymbol{A} \boldsymbol{x}}{\boldsymbol{x} ^{\top} \boldsymbol{x} }\) is called a Rayleigh quotient.

    • In addition, given a positive definite matrix \(\boldsymbol{B}\) of the same size, the quantity \(\frac{\boldsymbol{x} ^{\top} \boldsymbol{A} \boldsymbol{x} }{\boldsymbol{x} ^{\top} \boldsymbol{B} \boldsymbol{x} }\) is called a generalized Rayleigh quotient.

Applications

  • PCA: \(\max _{\boldsymbol{v} \neq 0} \frac{\boldsymbol{v}^{\top} \boldsymbol{\Sigma} \boldsymbol{v}}{\boldsymbol{v}^{\top} \boldsymbol{v}}\) where \(\boldsymbol{\Sigma}\) is a covariance matrix

  • LDA: \(\max _{\boldsymbol{v} \neq 0} \frac{\boldsymbol{v}^{\top} \boldsymbol{S}_{b} \boldsymbol{v}}{\boldsymbol{v}^{\top} \boldsymbol{S}_{w} \boldsymbol{v}}\) where \(\boldsymbol{S} _b\) is a between-class scatter matrix, and \(\boldsymbol{S} _w\) is a within-class scatter matrix

  • Spectral clustering (relaxed Ncut): \(\min _{\boldsymbol{v} \neq \boldsymbol{0}} \frac{\boldsymbol{v}^{\top} \boldsymbol{L} \boldsymbol{v}}{\boldsymbol{v}^{\top} \boldsymbol{D} \boldsymbol{v}} \quad \text{s.t. } \boldsymbol{v} ^{\top} \boldsymbol{D} \boldsymbol{1} = 0\) where \(\boldsymbol{L}\) is the graph Laplacian and \(\boldsymbol{D}\) is the degree matrix.

Theorem (Range of Rayleigh quotients)
For any symmetric matrix \(\boldsymbol{A} \in \mathbb{R}^{n \times n}\),

\[\begin{split}\begin{aligned} \max _{\boldsymbol{x} \in \mathbb{R}^{n}: \boldsymbol{x} \neq \boldsymbol{0}} \frac{\boldsymbol{x}^{\top} \boldsymbol{A} \boldsymbol{x}}{\boldsymbol{x}^{\top} \boldsymbol{x}} &=\lambda_{\max } \\ \min _{\boldsymbol{x} \in \mathbb{R}^{n}: \boldsymbol{x} \neq \boldsymbol{0}} \frac{\boldsymbol{x}^{\top} \boldsymbol{A} \boldsymbol{x}}{\boldsymbol{x}^{\top} \boldsymbol{x}} &=\lambda_{\min } \end{aligned}\end{split}\]

That is, the largest and smallest eigenvalues of \(\boldsymbol{A}\) give the range of the Rayleigh quotient. The maximum and the minimum are attained when \(\boldsymbol{x}\) is the corresponding eigenvector.

In addition, if we constrain \(\boldsymbol{x}\) to be orthogonal to the eigenvectors \(\boldsymbol{v} _1, \ldots, \boldsymbol{v} _j\) corresponding to the \(j\) largest eigenvalues, then

\[ \max _{\boldsymbol{x} \in \mathbb{R}^{n}: \boldsymbol{x} \neq \boldsymbol{0}, \boldsymbol{x} \perp \boldsymbol{v} _1 \ldots, \boldsymbol{v} _j} \frac{\boldsymbol{x}^{\top} \boldsymbol{A} \boldsymbol{x}}{\boldsymbol{x}^{\top} \boldsymbol{x}} =\lambda_{j+1} \]

and the maximum is achieved when \(\boldsymbol{x} = \boldsymbol{v} _{j+1}\).

Corollary (Generalized Rayleigh quotient problem)
For the generalized Rayleigh quotient \(\frac{\boldsymbol{x}^{\top} \boldsymbol{A} \boldsymbol{x}}{\boldsymbol{x}^{\top} \boldsymbol{B} \boldsymbol{x}}\), the smallest and largest attainable values are eigenvalues \(\lambda\) satisfying

\[ \boldsymbol{A v}=\lambda \boldsymbol{B v} \quad \Longleftrightarrow \quad \boldsymbol{B}^{-1} \boldsymbol{A v}=\lambda \boldsymbol{v} \]

That is, the smallest/largest quotient value equals the smallest/largest eigenvalue of \(\boldsymbol{B} ^{-1} \boldsymbol{A}\); this follows by substituting \(\boldsymbol{y} = \boldsymbol{B} ^{1/2} \boldsymbol{x}\), which turns the generalized quotient into the ordinary Rayleigh quotient of \(\boldsymbol{B} ^{-1/2} \boldsymbol{A} \boldsymbol{B} ^{-1/2}\), a matrix with the same eigenvalues as \(\boldsymbol{B} ^{-1} \boldsymbol{A}\). The left equation is called a generalized eigenvalue problem.


Semi-definite Programming

We introduce semidefinite programming, then use it to solve the max-cut problem, and analyze its performance on the min-cut problem over the stochastic block model.

Introduction

A linear programming problem is one in which we wish to maximize or minimize a linear objective function of real variables over a polytope. In semidefinite programming, we instead use real-valued vectors and are allowed to take the dot product of vectors. In general, an SDP has the form

\[\begin{split} \begin{array}{l} \min _{\boldsymbol{x}_{1}, \ldots, \boldsymbol{x}_{n} \in \mathbb{R}^{n}}& \sum_{i, j \in[n]} c_{ij} \langle \boldsymbol{x}_{i}, \boldsymbol{x}_{j} \rangle \\ \text {s.t.} &\sum_{i, j \in[n]} a_{ijk}\langle \boldsymbol{x}_{i}, \boldsymbol{x}_{j} \rangle \leq b_{k} \text { for all } k \end{array} \end{split}\]

The array of real variables in an LP is thus replaced by an array of vector variables, which together form a matrix variable. In this notation, the problem can be written as

\[\begin{split} \begin{aligned} \min _{\boldsymbol{X} \in \mathbb{R}^{n \times n}}\ &\langle \boldsymbol{C} , \boldsymbol{X} \rangle\\ \text {s.t.}\ & \left\langle \boldsymbol{A} _{k}, \boldsymbol{X} \right\rangle\leq b_{k}, \quad k=1, \ldots, m \\ & X_{ij} = \boldsymbol{x}_i ^{\top} \boldsymbol{x} _j \end{aligned} \end{split}\]

where \(C_{ij} = (c_{ij} + c_{ji})/2, A_{ij}^{(k)} = (a_{ijk} + a_{jik})/2\) and \(\langle \boldsymbol{P}, \boldsymbol{Q} \rangle = \operatorname{tr}\left(\boldsymbol{P} ^{\top} \boldsymbol{Q} \right) = \sum_{i,j=1}^n p_{ij}q_{ij}\) is the Frobenius inner product.

Note that an \(n \times n\) matrix \(\boldsymbol{M}\) is said to be positive semidefinite if it is the Gramian matrix of some vectors (i.e. if there exist vectors \(\boldsymbol{v} _1, \ldots, \boldsymbol{v} _n\) such that \(M_{ij} = \langle \boldsymbol{v} _i, \boldsymbol{v} _j \rangle\) for all \(i,j\)). Hence, the last constraint is just \(\boldsymbol{X} \succeq \boldsymbol{0}\). That is, the nonnegativity constraints on real variables in LP are replaced by semidefiniteness constraints on matrix variables in SDP.

\[\begin{split} \begin{aligned} \min _{\boldsymbol{X} \succeq \boldsymbol{0}}\ &\langle \boldsymbol{C} , \boldsymbol{X} \rangle\\ \text {s.t.}\ & \left\langle \boldsymbol{A} _{k}, \boldsymbol{X} \right\rangle\leq b_{k}, \quad k=1, \ldots, m \\ \end{aligned} \end{split}\]

All linear programs can be expressed as SDPs. SDPs are in fact a special case of cone programming and can be efficiently solved by interior point methods. Given the solution \(\boldsymbol{X}^*\) to the SDP in the standard form, the vectors \(\boldsymbol{v}_1, \ldots, \boldsymbol{v} _n\) can be recovered in \(\mathcal{O} (n^3)\) time, e.g. by using an incomplete Cholesky decomposition of \(\boldsymbol{X}^* = \boldsymbol{V} ^{\top} \boldsymbol{V}\) where \(\boldsymbol{V} = [\boldsymbol{v} _1, \ldots \boldsymbol{v} _n]\).

From Rayleigh Quotient

Recall that a Rayleigh quotient problem can be formulated as

\[ \max\ \boldsymbol{x} ^{\top} \boldsymbol{A} \boldsymbol{x} \qquad \mathrm{s.t.}\ \boldsymbol{x} ^{\top} \boldsymbol{x} = 1 \]

Since both the objective and the constraint are scalar valued, we can re-write them using trace

\[ \max\ \operatorname{tr}\left( \boldsymbol{x} ^{\top} \boldsymbol{A} \boldsymbol{x} \right) \qquad \mathrm{s.t.}\ \operatorname{tr}\left( \boldsymbol{x} ^{\top} \boldsymbol{x} \right) = 1 \]

Then, by the property of trace, we have

\[ \max\ \operatorname{tr}\left(\boldsymbol{A} \boldsymbol{x} \boldsymbol{x} ^{\top} \right) \qquad \mathrm{s.t.}\ \operatorname{tr}\left( \boldsymbol{x} \boldsymbol{x} ^{\top} \right) = 1 \]

The vector variable \(\boldsymbol{x}\) can then be replaced by the rank-1 matrix variable \(\boldsymbol{X} = \boldsymbol{x} \boldsymbol{x} ^{\top}\):

\[ \max\ \operatorname{tr}\left(\boldsymbol{A} \boldsymbol{X} \right) \qquad \mathrm{s.t.}\ \operatorname{tr}\left( \boldsymbol{X} \right) = 1, \boldsymbol{X} = \boldsymbol{x} \boldsymbol{x} ^{\top} \]

Note that \(\boldsymbol{X} = \boldsymbol{x} \boldsymbol{x} ^{\top}\) for some \(\boldsymbol{x}\) if and only if \(\operatorname{rank}\left( \boldsymbol{X} \right)=1\) and \(\boldsymbol{X} \succeq \boldsymbol{0}\), hence the constraints are equivalent to

\[ \max\ \operatorname{tr}\left(\boldsymbol{A} \boldsymbol{X} \right) \qquad \mathrm{s.t.}\ \operatorname{tr}\left( \boldsymbol{X} \right) = 1, \operatorname{rank}\left( \boldsymbol{X} \right)=1, \boldsymbol{X} \succeq 0 \qquad (RQ) \]

Call this problem RQ. If we drop the rank-1 constraint, we obtain a semidefinite program, in which the constraint \(\operatorname{tr}\left( \boldsymbol{X} \right) = 1\) can be viewed as \(\langle \boldsymbol{I} , \boldsymbol{X} \rangle = 1\).

\[ \max\ \operatorname{tr}\left(\boldsymbol{A} \boldsymbol{X} \right) \qquad \mathrm{s.t.}\ \operatorname{tr}\left( \boldsymbol{X} \right) = 1, \boldsymbol{X} \succeq 0 \qquad (SDP) \]

In fact, any optimal solution \(\boldsymbol{X} _{SDP}^*\) to this SDP can always be converted to a rank-1 matrix \(\boldsymbol{X} _{RQ}^*\) with the same objective value. Hence, solving the SDP is equivalent to solving RQ.

Max-cut Problem

In a graph \(G = (V, E)\) with edge weights \(\boldsymbol{W}\), we want to find a maximum cut, i.e. a partition of the vertices into two parts that maximizes the total weight of edges across the parts:

\[ \operatorname{cut} (\boldsymbol{W}) = \max_{\boldsymbol{x} \in \left\{ \pm 1 \right\}^n} \frac{1}{2} \sum_{i,j=1}^n w_{ij} (1 - x_i x_j) \]

where the product \(x_i x_j\) equals 1 if the two vertices \(i, j\) are in the same part, and \(-1\) otherwise. Hence, only the pairs with \(x_i x_j = -1\) contribute to the sum, and we divide by two since \((1- x_i x_j)=2\) for those pairs.

Denote \(\boldsymbol{X} = \boldsymbol{x} \boldsymbol{x} ^{\top} \in \mathbb{R} ^{n \times n}\). Then each \(\boldsymbol{x} \in \left\{ \pm 1 \right\}^n\) corresponds (up to the global sign flip \(\boldsymbol{x} \mapsto -\boldsymbol{x}\)) to a matrix \(\boldsymbol{X}\) in the domain \(\Omega = \left\{ \boldsymbol{X} \in \mathbb{R} ^{n \times n} : \boldsymbol{X} \succeq 0, X_{ii}=1, \operatorname{rank}\left( \boldsymbol{X} \right) = 1 \right\}\). The optimization problem can then be expressed as

\[\begin{split}\begin{aligned} \operatorname{cut} (\boldsymbol{W}) &= \max_{\boldsymbol{X} \in \Omega} \frac{1}{2} \operatorname{tr}\left( \boldsymbol{W} (\boldsymbol{1} \boldsymbol{1} ^{\top} - \boldsymbol{X}) \right) \\ \end{aligned}\end{split}\]

Solving this integer optimization problem is NP-hard, so we work with relaxations of \(\Omega\).

Relaxation

Spectral Relaxation

Two relaxations:

  • drop the rank-1 constraint \(\operatorname{rank}\left( \boldsymbol{X} \right) = 1\)

  • replace the diagonal constraint \(X_{ii}=1\) by \(\operatorname{tr}\left( \boldsymbol{X} \right) = n\)

Since \(\operatorname{tr}\left( \boldsymbol{W} \boldsymbol{1} \boldsymbol{1} ^{\top} \right)\) is a constant, maximizing the relaxed objective is equivalent to solving

\[ \min\ \operatorname{tr}\left(\boldsymbol{W}\boldsymbol{X} \right) \qquad \text{s.t. }\boldsymbol{X} \succeq \boldsymbol{0} , \operatorname{tr}\left( \boldsymbol{X} \right) = n \]

As proved above, this SDP can be solved by solving a Rayleigh quotient problem

\[ \min_{\boldsymbol{y}}\ \boldsymbol{y} ^{\top} \boldsymbol{W} \boldsymbol{y} \qquad \text{s.t.} \left\| \boldsymbol{y} \right\| = \sqrt{n} \]

The optimal solution \(\boldsymbol{y} ^*\) is \(\sqrt{n}\) times the eigenvector of \(\boldsymbol{W}\) corresponding to its smallest eigenvalue, and the optimal objective value is \(n\) times that eigenvalue. The solution to the SDP problem is then \(\boldsymbol{X} ^* = \boldsymbol{y}^* \boldsymbol{y} ^{*\top}\), with the same objective value. We can then round \(\boldsymbol{y} ^*\) by its sign to decide the partition assignment.

SDP Relaxation

Only one relaxation: drop the rank-1 constraint, which is non-convex. The remaining two constraints form the domain

\[\Omega_{SDP} = \left\{ \boldsymbol{X} \in \mathbb{R} ^{n \times n}: \boldsymbol{X} \succeq 0, X_{ii}=1 \right\}\]

The optimization problem is then

\[ \operatorname{SDP} (\boldsymbol{W}) = \max_{\boldsymbol{X} \in \Omega_{SDP}} \frac{1}{2} \sum_{i,j=1}^n w_{ij} (1 - X_{ij}) \]

Note that after we drop the rank-1 constraint,

  • \(\operatorname{SDP}(\boldsymbol{W} ) \ge \operatorname{cut} (\boldsymbol{W})\).

  • The solution \(\boldsymbol{X} ^*\) to \(\operatorname{SDP} (\boldsymbol{W})\) may not be rank-1. If it is rank-1, then the inequality above holds with equality.

It remains to solve \(\operatorname{SDP} (\boldsymbol{W})\) for \(\boldsymbol{X}\), and then find a way to obtain the binary partition assignment \(\hat{\boldsymbol{x}} \in \left\{ \pm 1 \right\}^n\) from \(\boldsymbol{X}\).

Random Rounding

Algorithm

  • Solve \(\operatorname{SDP} (\boldsymbol{W})\) and obtain \(\hat{\boldsymbol{X}}\)

  • Decompose \(\hat{\boldsymbol{X}} = \hat{\boldsymbol{V}} ^{\top} \boldsymbol{\hat{V}}\), e.g. using the EVD \(\boldsymbol{\hat{X}} = \boldsymbol{\hat{U}} \boldsymbol{\hat{\Lambda}} \boldsymbol{\hat{U}} ^{\top}\) with \(\boldsymbol{\hat{V}} ^{\top} = \boldsymbol{\hat{U}} \sqrt{\boldsymbol{\hat{\Lambda}}}\), or using Cholesky. Note that \(\left\| \boldsymbol{\hat{v}} _i \right\| =1\) always holds, due to the constraint \(\hat{X}_{ii}=1\).

  • Sample a direction \(\boldsymbol{r}\) uniformly from the unit sphere \(S^{n-1}\)

  • Return the binary partition assignment \(\hat{\boldsymbol{x}} = \operatorname{sign} (\boldsymbol{\hat{V}} ^{\top} \boldsymbol{r} )\) (a code sketch of the full pipeline follows this list)

Intuition

  • Randomly sample a hyperplane in \(\mathbb{R} ^n\) characterized by the vector \(\boldsymbol{r}\). If \(\hat{\boldsymbol{v}} _i\) lies on the same side of the hyperplane as \(\boldsymbol{r}\), then set \(\hat{x}_i =1\), else \(\hat{x}_i = -1\).

  • If there indeed exists a partition \(I\) and \(J\) of the vertices characterized by \(\boldsymbol{x}\), then the two groups of directions \(\boldsymbol{v} _i\)’s and \(\boldsymbol{v} _j\)’s should point in opposite directions, since \(\boldsymbol{v} _i ^{\top} \boldsymbol{v} _j = x_i x_j = -1\). After random rounding, they should be well separated. Hence, if \(\hat{\boldsymbol{v}}_i ^{\top} \hat{\boldsymbol{v} }_j\) recovers \(\boldsymbol{v}_i ^{* \top} \boldsymbol{v}^* _j\) well enough, then \(\hat{\boldsymbol{x}}\) recovers \(\boldsymbol{x}^*\), the optimal max-cut assignment in \(\operatorname{cut}(\boldsymbol{W})\), well.

\[\begin{split}\begin{aligned} \boldsymbol{X} ^* \text{ to } \operatorname{cut} \ & \Leftrightarrow \quad \boldsymbol{x} ^* \text{ to } \operatorname{cut}\\ \text{SDP relaxation}\quad \downarrow \quad &\qquad \qquad \uparrow \text{recover} \\ \boldsymbol{\hat{X}} \text{ to } \operatorname{SDP} & \xrightarrow[\text{rounding} ]{\text{random} } \hat{\boldsymbol{x}} = \operatorname{sign} (\boldsymbol{\hat{V}} ^{\top} \boldsymbol{r} )\\ \end{aligned}\end{split}\]

To see what \(\boldsymbol{V}\) looks like for \(\boldsymbol{x} \in \left\{ \pm 1 \right\} ^n\), let \(\boldsymbol{x} = [1, 1, -1, -1, -1]^{\top}\); then

\[\begin{split} \boldsymbol{X} = \left[\begin{array}{rrrrr} 1 & 1 & -1 & -1 & -1 \\ 1 & 1 & -1 & -1 & -1 \\ -1 & -1 & 1 & 1 & 1 \\ -1 & -1 & 1 & 1 & 1 \\ -1 & -1 & 1 & 1 & 1 \end{array}\right],\qquad \boldsymbol{V} = \left[ \begin{array}{rrrrr} -1 & -1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array} \right] \end{split}\]

Analysis

How well does the algorithm do in expectation? We define the Goemans-Williamson quantity, the expected cut value returned by the algorithm, where the randomness comes from the random direction \(\boldsymbol{r}\):

\[\operatorname{GW} (\boldsymbol{W}) = \mathbb{E}_{\boldsymbol{r}} [f_{\boldsymbol{W} }(\hat{\boldsymbol{x}}) ] = \mathbb{E}_{\boldsymbol{r}} \left[ \frac{1}{2} \sum_{i,j=1}^n w_{ij} (1 - \hat{x}_i \hat{x}_j)\right]\]

Obviously \(\operatorname{GW} (\boldsymbol{W}) \le \operatorname{cut} (\boldsymbol{W})\), since we are averaging the value of feasible solutions \(\hat{\boldsymbol{x}}\) in \(\left\{ \pm 1\right\}^n\), and each of them is \(\le \operatorname{cut} (\boldsymbol{W})\). But how small can \(\operatorname{GW} (\boldsymbol{W})\) be?

It can be shown that \(\operatorname{GW}(\boldsymbol{W}) \ge \alpha \operatorname{cut}(\boldsymbol{W})\) where \(\alpha \approx 0.878\). That is, in expectation the random rounding algorithm returns a cut value that is not much smaller than the optimal value.

Fig. 10 Plots of \(f(y)\) and \(g(y)\)

An example of random rounding is given below: two unit vectors \(\boldsymbol{v}_1, \boldsymbol{v} _2\) (red, green) are shown together with the random directions \(\boldsymbol{r}\) (blue) on the unit sphere whose corresponding hyperplanes separate the two vectors.

import numpy as np
import plotly.express as px
import plotly.io as pio

pio.renderers.default = "notebook"

# two unit vectors v1, v2 with a small angle between them
v = np.array([[1, 0, 3], [-1, 0, 3]])
v = v / np.linalg.norm(v, 2, axis=1)[:, np.newaxis]

# sample n random directions uniformly on the unit sphere S^2
n = 3000
np.random.seed(1)
x = np.random.normal(size=(n, 3))
x = x / np.linalg.norm(x, 2, axis=1)[:, np.newaxis]

# keep the directions whose hyperplane separates v1 and v2,
# i.e. <x, v1> and <x, v2> have opposite signs
x = x[np.dot(x, v[0]) * np.dot(x, v[1]) < 0, :]

# the separating probability equals arccos(<v1, v2>) / pi
print(f'v1 = {np.round(v[0], 3)}, \nv2 = {np.round(v[1], 3)}')
print(f'arccos(v1,v2)/pi = {np.round(np.arccos(v[0] @ v[1].T) / np.pi, 3)}')
print(f'simulated result = {np.round(len(x) / n, 3)}')

fig = px.scatter_3d(x=x[:, 0], y=x[:, 1], z=x[:, 2], size=np.ones(len(x)), range_z=[-1, 1])
fig.add_scatter3d(x=[0, v[0, 0]], y=[0, v[0, 1]], z=[0, v[0, 2]], name='v1')
fig.add_scatter3d(x=[0, v[1, 0]], y=[0, v[1, 1]], z=[0, v[1, 2]], name='v2')
fig.show()
v1 = [0.316 0.    0.949], 
v2 = [-0.316  0.     0.949]
arccos(v1,v2)/pi = 0.205
simulated result = 0.208

Besides, how large can the SDP relaxation value \(\operatorname{SDP} (\boldsymbol{W})\) be? Grothendieck’s Inequality says

\[ K \operatorname{cut}(\boldsymbol{W}) \ge \operatorname{SDP}(\boldsymbol{W}) \]

where \(K \approx 1.7\). Hence, the SDP relaxation \(\Omega_{SDP}\) does not relax the original domain \(\Omega\) too much (otherwise we might see \(\operatorname{SDP}(\boldsymbol{W}) \gg \operatorname{cut}(\boldsymbol{W})\)), so \(\hat{\boldsymbol{v}}_i ^{\top} \boldsymbol{\hat{v}} _j\) should recover the binary products \(x_i^* x_j^*\) reasonably well.

For SBM

The above inequalities apply to any problem instance \(G=(V, E, \boldsymbol{W})\), so for a particular model they may give overly loose or even vacuous guarantees. Let's examine the performance on the stochastic block model.

We work with the mean-shifted matrix \(\boldsymbol{B} = 2\boldsymbol{A} - \boldsymbol{1} \boldsymbol{1} ^{\top}\), where

\[\begin{split} b_{ij} = \left\{\begin{array}{ll} 1, & \text { if } a_{ij}=1 \\ -1, & \text { if } a_{ij}=0 \end{array}\right. \end{split}\]

Essentially, \(\boldsymbol{B}\) just re-codes the connectivity in \(G\) from 1/0 to 1/-1. Note that in an SBM, \(\boldsymbol{B}\) is random, depending on the parameters \(p\) and \(q\). In the perfect case \(p=1, q=0\), we can read the cluster labels \(\boldsymbol{x}\in \left\{ \pm 1 \right\}^n\) directly from \(\boldsymbol{B}\), which can be expressed exactly as \(\boldsymbol{B} = \boldsymbol{x} \boldsymbol{x} ^{\top}\). In general, \(\boldsymbol{B}\) cannot be expressed as \(\boldsymbol{x} \boldsymbol{x} ^{\top}\) for any \(\boldsymbol{x} \in \left\{ \pm 1 \right\}^n\). We instead look for some \(\boldsymbol{X} = \boldsymbol{x} \boldsymbol{x} ^{\top}\) that is close to \(\boldsymbol{B}\), and then use \(\boldsymbol{x}\) as the approximate cluster labels.

Similar to the max-cut case, we apply SDP relaxation that drops the rank-1 constraint to \(\boldsymbol{X}\). The SDP problem is then

\[ \max\ \operatorname{tr}\left( \boldsymbol{B} \boldsymbol{X} \right) \qquad \text{s.t. } \boldsymbol{X} \succeq 0, X_{ii}=1 \]

Note that \(\operatorname{tr}\left( \boldsymbol{B} \boldsymbol{X} \right) = \langle \boldsymbol{B} , \boldsymbol{X} \rangle = \sum_{i,j=1}^n b_{ij} X_{ij}\). Next, we show that the solution \(\hat{\boldsymbol{X}}\) to the above problem is exactly rank-1, even though we have dropped the rank-1 constraint.

TBC